Distributed Storage of RDF Based on Clustering

Authors

  • Yonglin Leng
  • Fuyu Lu
Abstract

As RDF (Resource Description Framework) data is applied more widely, its volume grows rapidly, and RDF storage has become an active research topic in the data storage field. Distributed storage is an effective way to handle the storage and querying of RDF data, and data partitioning is a prerequisite for distributed storage. In this paper we use graph clustering to partition RDF data effectively. Because RDF can be described as a directed graph, we use the P-Rank (Penetrating Rank) algorithm to calculate the similarity of node pairs in the RDF graph, and then apply an improved K-means clustering algorithm to the similarity results to realize distributed storage of the RDF data. The experimental results show that this method partitions RDF data effectively, making the intra-cluster similarity larger and the inter-cluster similarity smaller.

Keywords: RDF, directed graph, P-Rank, clustering.
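
The pipeline described in the abstract (P-Rank similarity followed by clustering) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the function names `p_rank` and `kmeans`, the parameters `lam`, `decay` and `iterations`, and the toy triples are hypothetical; predicate labels are ignored; and the clustering step is plain K-means with farthest-first initialisation applied to the rows of the similarity matrix, since the abstract does not say how the authors' improved K-means differs. It assumes NumPy is available.

```python
import numpy as np

def p_rank(triples, lam=0.5, decay=0.8, iterations=10):
    """Iterative P-Rank on the directed graph induced by (s, p, o) triples.

    lam weights in-link against out-link evidence; decay is the damping
    factor (one value is used for both directions, a simplification).
    """
    nodes = sorted({s for s, _, o in triples} | {o for s, _, o in triples})
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adj = np.zeros((n, n))
    for s, _, o in triples:          # predicate labels are ignored here
        adj[index[s], index[o]] = 1.0

    in_deg = adj.sum(axis=0)
    out_deg = adj.sum(axis=1)
    # Column-normalised (in-links) and row-normalised (out-links) matrices;
    # nodes without neighbours keep all-zero columns/rows.
    p_in = adj / np.where(in_deg > 0, in_deg, 1.0)
    p_out = adj / np.where(out_deg > 0, out_deg, 1.0)[:, None]

    sim = np.eye(n)
    for _ in range(iterations):
        in_part = p_in.T @ sim @ p_in      # averaged similarity of in-neighbours
        out_part = p_out @ sim @ p_out.T   # averaged similarity of out-neighbours
        sim = decay * (lam * in_part + (1 - lam) * out_part)
        np.fill_diagonal(sim, 1.0)         # every node is fully similar to itself
    return nodes, sim

def kmeans(points, k, iterations=20):
    """Plain K-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iterations):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):        # leave empty clusters unchanged
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Hypothetical toy data: two unconnected groups of subjects and objects.
triples = [
    ("a", "p", "x"), ("b", "p", "x"), ("a", "q", "y"), ("b", "q", "y"),
    ("c", "p", "z"), ("d", "p", "z"), ("c", "q", "w"), ("d", "q", "w"),
]
nodes, sim = p_rank(triples)
labels = kmeans(sim, k=4)
partition = dict(zip(nodes, labels.tolist()))
```

In this sketch each node is represented by its row of P-Rank scores, so nodes with the same structural role (for example, subjects that point to the same objects) end up in the same partition. Mapping each cluster label to a storage node would be the remaining step toward a distributed layout.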

Similar resources

Selforganisation in a Storage for Semantic Information

Scalable distributed semantic storage infrastructures are hard to realize. We propose the usage of principles of selforganization for the storage and retrieval of RDF triples. We use a biology-inspired algorithm for clustering of triples based on a purely syntactical similarity measure.

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

Distributed Storage and Query of Large RDF Graphs

RDF tuples are the building blocks of the semantic web. As more data are expressed as RDF tuples, storage capabilities become important. The data set will become increasingly large such that it is necessary for data to be stored across multiple machines. Data set will be partitioned into smaller subsets, each containing an incomplete picture about data relationships. This has implications for q...

Gravimetric storage capacity of Hydrogen on C24H12 Coronene and its Si substituted at 298 K, a Monte Carlo Simulation

In this study, the radial distribution and gravimetric storage capacities of hydrogen on coronene (C24H12) and its Si substituted forms, C24H12, C24-nSinH12 (n= 4-24), have been investigated at 298 K and 0.1 MPa (standard situation) using (N,V,T) Monte Carlo simulation by Lennard-Jones (LJ) 12-6 potential. The results show that the increase of number of silicon substitution doesn’t have any eff...

Storage Balancing in P2P Based Distributed RDF Data Stores

Centralized RDF repositories have been designed to support RDF data storage and retrieval. However, they suffer from the traditional limitations of centralized approaches which are scalability and fault tolerance. Peer to Peer (P2P) networks can provide the scalability, fault-tolerance and robustness, features that the current solutions to local RDF storage do not provide which are needed by th...

Publication date: 2015